211 research outputs found

    Gradient methods for convex minimization: better rates under weaker conditions

    Full text link
    The convergence behavior of gradient methods for minimizing convex differentiable functions is one of the core questions in convex optimization. This paper shows that their well-known complexities can be achieved under conditions weaker than the commonly accepted ones. We relax the common gradient Lipschitz-continuity condition and strong convexity condition to ones that hold only over certain line segments. Specifically, we establish complexities O(RΟ΅)O(\frac{R}{\epsilon}) and O(RΟ΅)O(\sqrt{\frac{R}{\epsilon}}) for the ordinary and accelerate gradient methods, respectively, assuming that βˆ‡f\nabla f is Lipschitz continuous with constant RR over the line segment joining xx and xβˆ’1Rβˆ‡fx-\frac{1}{R}\nabla f for each x\in\dom f. Then we improve them to O(RΞ½log⁑(1Ο΅))O(\frac{R}{\nu}\log(\frac{1}{\epsilon})) and O(RΞ½log⁑(1Ο΅))O(\sqrt{\frac{R}{\nu}}\log(\frac{1}{\epsilon})) for function ff that also satisfies the secant inequality Β β‰₯Ξ½βˆ₯xβˆ’xβˆ—βˆ₯2\ \ge \nu\|x-x^*\|^2 for each x\in \dom f and its projection xβˆ—x^* to the minimizer set of ff. The secant condition is also shown to be necessary for the geometric decay of solution error. Not only are the relaxed conditions met by more functions, the restrictions give smaller RR and larger Ξ½\nu than they are without the restrictions and thus lead to better complexity bounds. We apply these results to sparse optimization and demonstrate a faster algorithm.Comment: 20 pages, 4 figures, typos are corrected, Theorem 2 is ne

    A fast patch-dictionary method for whole image recovery

    Full text link
    Various algorithms have been proposed for dictionary learning. Among those for image processing, many use image patches to form dictionaries. This paper focuses on whole-image recovery from corrupted linear measurements. We address the open issue of representing an image by overlapping patches: the overlapping leads to an excessive number of dictionary coefficients to determine. With very few exceptions, this issue has limited the applications of image-patch methods to the local kind of tasks such as denoising, inpainting, cartoon-texture decomposition, super-resolution, and image deblurring, for which one can process a few patches at a time. Our focus is global imaging tasks such as compressive sensing and medical image recovery, where the whole image is encoded together, making it either impossible or very ineffective to update a few patches at a time. Our strategy is to divide the sparse recovery into multiple subproblems, each of which handles a subset of non-overlapping patches, and then the results of the subproblems are averaged to yield the final recovery. This simple strategy is surprisingly effective in terms of both quality and speed. In addition, we accelerate computation of the learned dictionary by applying a recent block proximal-gradient method, which not only has a lower per-iteration complexity but also takes fewer iterations to converge, compared to the current state-of-the-art. We also establish that our algorithm globally converges to a stationary point. Numerical results on synthetic data demonstrate that our algorithm can recover a more faithful dictionary than two state-of-the-art methods. Combining our whole-image recovery and dictionary-learning methods, we numerically simulate image inpainting, compressive sensing recovery, and deblurring. Our recovery is more faithful than those of a total variation method and a method based on overlapping patches

    Block stochastic gradient iteration for convex and nonconvex optimization

    Full text link
    The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are easier to update individually than together, BCD has a lower per-iteration cost. This paper introduces a method that combines the features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifically, a block stochastic gradient (BSG) method is proposed for solving both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions from the sum term in the objective, and then, using those samples, it updates all the blocks of variables in either a deterministic or a randomly shuffled order. Its convergence for both convex and nonconvex cases are established in different senses. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex
    • …
    corecore